Replicability crisis in Ecology too?


  • High false-positive rates (type I errors) in ecological studies

  • An unintentional cause: failing to check model assumptions

  • Researchers don’t check models -> higher chance of false conclusions

  • Few tools for diagnosing GLMs and GLMMs



Can you trust your model?

Dispersion problems in count data


  • Example count data:

    • Species richness
    • Abundance of individuals
  • Modeling with Poisson GLMs/GLMMs


UNDER or OVERDISPERSION:

When the data have more or less variability than expected under the distribution used for modeling.
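
A minimal base R sketch of what this means, using simulated counts rather than real data. The mean of 5, the sample size and the choice of negative binomial and binomial generators are illustrative assumptions; the point is only that a Poisson model expects a variance-to-mean ratio near 1.

```r
set.seed(42)                       # arbitrary seed, for reproducibility only
n  <- 10000
mu <- 5                            # illustrative mean count

y_pois  <- rpois(n, lambda = mu)                 # Poisson: variance equals the mean
y_over  <- rnbinom(n, mu = mu, size = 1)         # negative binomial: variance = mu + mu^2 / size
y_under <- rbinom(n, size = 10, prob = mu / 10)  # binomial: variance = mu * (1 - prob)

# Variance-to-mean ratio: about 1 for Poisson, above 1 when overdispersed,
# below 1 when underdispersed
ratio <- function(y) var(y) / mean(y)
round(c(poisson = ratio(y_pois), over = ratio(y_over), under = ratio(y_under)), 2)
```

In a fitted model the comparison is made conditional on the predictors, which is why a residual-based test is used instead of this raw ratio.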

GOALS


  • Make ecologists aware of dispersion problems in count data


  • Identify and describe the 3 main causes using the model diagnostic tools in the DHARMa R package


  • Show some modeling solutions for these causes, with the glmmTMB R package

3 causes of dispersion problems

“Real” overdispersion:

Abundances vary more than expected by the model, in general.

Heteroscedasticity:

Variation in abundance increases along the environmental gradient.

Zero-inflation:

More zero abundances than expected by the model.
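
The three causes can be mimicked with simulated counts in base R. This is only a sketch: the gradient `env`, the coefficients, the 30% share of extra zeros and the helper `pearson_disp()` are all made up for illustration and are not taken from the data used in these slides.

```r
set.seed(1)
n   <- 20000
env <- runif(n, -1, 1)             # hypothetical environmental gradient
mu  <- exp(1 + 0.5 * env)          # same mean structure in all three cases

# 1. "Real" overdispersion: extra variance everywhere (constant size parameter)
y_over <- rnbinom(n, mu = mu, size = 1)

# 2. Heteroscedasticity: extra variance grows along the gradient
#    (the size parameter shrinks as env increases)
y_het <- rnbinom(n, mu = mu, size = exp(2 - 3 * env))

# 3. Zero-inflation: Poisson counts, with about 30% of observations forced to zero
y_zi <- rpois(n, lambda = mu) * rbinom(n, size = 1, prob = 0.7)

# Mean squared Pearson residual against the true Poisson mean:
# about 1 if the counts were really Poisson
pearson_disp <- function(y, keep) mean((y[keep] - mu[keep])^2 / mu[keep])
low  <- env < 0
high <- env >= 0

data.frame(
  cause     = c("overdispersion", "heteroscedasticity", "zero-inflation"),
  disp_low  = c(pearson_disp(y_over, low),  pearson_disp(y_het, low),  pearson_disp(y_zi, low)),
  disp_high = c(pearson_disp(y_over, high), pearson_disp(y_het, high), pearson_disp(y_zi, high)),
  zeros     = c(mean(y_over == 0), mean(y_het == 0), mean(y_zi == 0))
)
mean(exp(-mu))   # share of zeros a Poisson model with this mean would expect
```

All three data sets look overdispersed relative to a Poisson model, which is why a single dispersion statistic cannot separate the causes.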

Consequences of dispersion problems


OVERDISPERSION

  • Standard errors and confidence intervals that are too small

  • Larger chance of false positive results


Wrong estimates, especially if you ignore other processes (e.g. causes of zero-inflation) in your data-generating process.


A missed opportunity to learn more from your data: unexpected variability often has an ecological meaning worth modeling.
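
A small simulation sketch of the false-positive problem, in base R. The predictor has no true effect on the counts, so any "significant" slope is a type I error. The helper `false_positive_rate()`, the mean of 10, the size parameter and the number of simulations are illustrative assumptions.

```r
set.seed(123)

# Share of simulations in which a Poisson GLM declares a null predictor significant
false_positive_rate <- function(simulate_counts, n_sim = 500, n = 100) {
  p_values <- replicate(n_sim, {
    env <- rnorm(n)                           # predictor with NO true effect
    y   <- simulate_counts(n)                 # counts generated independently of env
    fit <- glm(y ~ env, family = poisson())
    summary(fit)$coefficients["env", "Pr(>|z|)"]
  })
  mean(p_values < 0.05)
}

# Counts that really are Poisson: rate should be close to the nominal 0.05
rate_pois <- false_positive_rate(function(n) rpois(n, lambda = 10))

# Overdispersed counts analysed with the same Poisson model
rate_over <- false_positive_rate(function(n) rnbinom(n, mu = 10, size = 1))

c(poisson_data = rate_pois, overdispersed_data = rate_over)
```

The overdispersed case should show a false-positive rate well above 5%, because the Poisson model underestimates the standard error of the slope.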

Residual diagnostics with DHARMa

  • Scaled quantile residuals -> obtained by simulating from the fitted model

  • Residuals between 0 and 1 for ANY model complexity or distribution

  • Interpreted the SAME way:

If your model is correctly specified, i.e. it captures the “data-generating process”, the scaled quantile residuals will follow a uniform (“flat”) distribution between 0 and 1.
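
To make the idea concrete, here is a hand-rolled sketch in base R, written in the spirit of scaled quantile residuals rather than as DHARMa's actual implementation. It assumes a plain Poisson GLM on simulated data; the variable names and the number of simulations are illustrative.

```r
set.seed(7)
n   <- 200
env <- runif(n)
y   <- rpois(n, lambda = exp(0.5 + env))   # data generated by a Poisson process
fit <- glm(y ~ env, family = poisson())    # correctly specified model

# Simulate new responses from the fitted model: one column per simulation
n_sim <- 250
sims  <- as.matrix(simulate(fit, nsim = n_sim))

# For each observation, the share of simulated values below it.
# Ties are broken at random so that integer data give continuous residuals.
below      <- rowMeans(sims < y)
ties       <- rowMeans(sims == y)
scaled_res <- below + runif(n) * ties

# Under a correct model the residuals should look uniform on [0, 1]
ks_result <- ks.test(scaled_res, "punif")
ks_result$p.value
```

In practice `simulateResiduals()` does this work for you and supports mixed models, so the sketch is only meant to show where the 0 to 1 scale comes from.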

Detecting “real” overdispersion

Wrong model

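library(glmmTMB)  # model fitting
library(DHARMa)   # simulation-based residual diagnostics

# Poisson GLMM: assumes the variance equals the mean
m <- glmmTMB(observedResponse ~ Environment1 + (1 | group),
             family = poisson(), data = overData)
res <- simulateResiduals(m)  # scaled quantile residuals
plot(res)                    # QQ plot and residuals vs. predicted values
testDispersion(res)          # simulation-based dispersion test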

Dispersion = 5.19, p-value = 0.

Solution

# Negative binomial (quadratic variance): estimates an extra dispersion parameter
m <- glmmTMB(observedResponse ~ Environment1 + (1 | group),
             family = nbinom2(), data = overData)
res <- simulateResiduals(m)
plot(res)
testDispersion(res)

Dispersion = 1.19, p-value = 0.224.

Detecting heteroscedasticity

Wrong model

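# Fit and plot with the same data frame (here called `data`)
m <- glmmTMB(observedResponse ~ Environment1 + (1 | group),
             family = poisson(), data = data)
res <- simulateResiduals(m)
# Absolute residual deviation against the predictor:
# a trend suggests the variance changes along Environment1
plotResiduals(res, form = data$Environment1,
              absoluteDeviation = TRUE)
testDispersion(res)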

Dispersion = 1.9, p-value = 0.

Solution

# Poisson has no dispersion parameter, so use a family that has one
m <- glmmTMB(observedResponse ~ Environment1 + (1 | group),
             dispformula = ~ Environment1,  # dispersion varies with Environment1
             family = nbinom2(), data = data)
res <- simulateResiduals(m)
plotResiduals(res, form = data$Environment1,
              absoluteDeviation = TRUE)
testDispersion(res)

Dispersion = 1.11, p-value = 0.44.

Detecting zero-inflation

Wrong model

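# Poisson GLMM without a zero-inflation component
m <- glmmTMB(observedResponse ~ Environment1 + (1 | group),
             family = poisson(), data = data)
res <- simulateResiduals(m)
plot(res)
testZeroInflation(res)  # compares observed zeros with zeros simulated from the model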

Zero-inflation = 5.16, p-value = 0.

Solution

m <- glmmTMB(observedResponse ~ Environment1 + (1 | group),
             ziformula = ~ 1,  # zero-inflation formula (constant probability)
             family = poisson(), data = data)
res <- simulateResiduals(m)
plotResiduals(res)
testZeroInflation(res)

Zero-inflation = 1, p-value = 1.

Detecting dispersion problems


  • Residual patterns alone will not tell you the cause of overdispersion. E.g.:

    • ‘Real’ overdispersion can produce a significant zero-inflation test, and vice versa.

    • ‘Real’ overdispersion and zero-inflation may both show up as significant heteroscedasticity.


  • Additional check: fit models that address each potential problem and compare their fit (e.g. AIC, LRT) and residual diagnostics.

Don’t always assume the most complex/complicated model is the correct one!
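
One way such a comparison might look, as a sketch only. The data here are simulated with DHARMa's `createData()` as a stand-in for the data behind these slides; the sample size, intercept and 30% zero-inflation are arbitrary choices, and the object names `dat`, `m_pois`, `m_nb` and `m_zip` are illustrative.

```r
library(glmmTMB)
library(DHARMa)

set.seed(2024)
# Simulated stand-in data with known zero-inflation; createData() returns
# the columns observedResponse, Environment1 and group used below
dat <- createData(sampleSize = 500, intercept = 1.5, numGroups = 10,
                  family = poisson(), pZeroInflation = 0.3)

# Candidate models, each addressing a different potential cause
m_pois <- glmmTMB(observedResponse ~ Environment1 + (1 | group),
                  family = poisson(), data = dat)
m_nb   <- glmmTMB(observedResponse ~ Environment1 + (1 | group),
                  family = nbinom2(), data = dat)
m_zip  <- glmmTMB(observedResponse ~ Environment1 + (1 | group),
                  ziformula = ~ 1, family = poisson(), data = dat)

# Information criteria for all candidates
aic_table <- AIC(m_pois, m_nb, m_zip)
aic_table

# Likelihood-ratio test: the Poisson model is nested in the zero-inflated one
anova(m_pois, m_zip)

# Re-check the residuals of the preferred model
testZeroInflation(simulateResiduals(m_zip))
```

Because the simulated data contain zero-inflation by construction, the zero-inflated model should be preferred here. With real data, the fit comparison and the residual checks should point the same way before a model is accepted.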

Conclusion


  • There are many causes of dispersion problems in GLMMs


  • Use DHARMa residuals tools to detect them


  • Address the problem with adequate models, e.g. fitted with glmmTMB

Take-home message

  • Models should ALWAYS be checked: residual diagnostics!


  • Avoid an oversimplistic view of dispersion problems


  • Detecting and addressing the causes of dispersion problems may also be informative for your system/data.


Coming soon:

Leite et al. in prep. Dispersion tests in GLMMs: a methods comparison and practical guide.